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ABSTRACT 

We have developed an algorithm that, starting from the observed properties of the X- 
ray spectrum and fast variability of an X-ray binary allows the production of synthetic 
data reproducing observables such as power density spectra and time lags, as well as 
their energy dependence. This allows to reconstruct the variability of parameters of 
the energy spectrum and to reduce substantially the effects of Poisson noise, allowing 
to study fast spectral variations. We have applied the algorithm to Rossi X-ray Timing 
Explorer data of the black- hole binary Cygnus X-l, fitting the energy spectrum with 
a simplified power law model. We recovered the distribution of the power law spectral 
indices on time-scales as low as 62 ms as being limited between 1.6 and 1.8. The index 
is positively correlated with the flux even on such time-scales. 

Key words: X-rays: binaries - X-rays: individual: Cygnus X-l - methods: statistical 
- methods: data analysis 



1 INTRODUCTION 

Black hole binaries (BHB) exhibit considerable X-ray vari- 
ability on a wide range of time-scales. The study of X-ray 
fast time variability has become an important astrophysical 
research tool that helps us gain better insight into the physi- 
ca l process at work n ear the black hole (see the recent review 
of Ivan der Klia 2004). For instance, the dynamic time-scale 
for the motion within a few Schwarzschild radii of a 10 Mq 
black hole is at the order of milliseconds. Further consider- 
ing that most of the gravitational energy of accretion matter 
is released in the inner area of a few Schwarzschild radii, the 
variability at short time-scales can be used to probe the 
accretion-flow dynamics and geometries within the strong- 
field region. 

Time variability can be studied in the time domain or 
in the frequency domain. The latter is based on the Fourier 
transform (FT) and usually is based upon two basic tech- 
niques, the Power Density Spectrum (PDS) and the time lag 
spectrum. The square of Fourier transform amplitudes as a 
function of Fourier frequency constitutes the PDS, which 
provides the estimate of variance at different frequencies. 
The time lag spectrum is obtained from the phase lag, i.e. 
the phase angle difference between the Fourier vectors at 
different energy channels. In practice, the PDS and the lag 
spectra are usually averaged over many segments of obser- 
vation and frequencies in order to increase the statistical 
significance. The Fourier transform is reversible: the time 
series can be reconstructed from its Fourier transform by 
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means of Inverse Fourier transform (IFT). On the contrary, 
the PDS is not reversible, since the phase information in the 
FT is lost. In principle there is an infinite variety of different 
signals that will yield the same PDS. 

The fast variability observed from BHB is of stochas- 
tic nature and as such cannot be modeled directly. In other 
words, it is not possible to reproduce the exact observed 
variations. The aim of time-series analysis is to character- 
ize the average properties that give rise to the fluctuations, 
under the assumption that the process is stationary. A suc- 
cessful model should reproduce the PDS and the lag spec- 
trum, as well as other sta tistical properties of the signal 
(see, e.g. lUttlev et a l. 2005). A conventional model describ- 
ing the temporal fluctuati on is the shot-noise model (|Terrelll 
Il972l ; iNegoro et al.l ll994T ). It has become clear, however, 
that in this framework complex shot profiles or distribu- 
tions of shot durations and amplitudes ha ve to be assumed 
to model the variability of BHBs (e.g. [M iyamot o et al.l 
1 19881 : iBelloni fc Hasingerl Il990t lLochner et all Il99ll h An 
alternative way is to apply Linear State Space Models 
(LSSMs) which are based on stochastic processes, or au- 
toregres sive (AR) processes to describe the temporal var i- 
ability (iKonig fc Timmerl 1 19971 : iPottschmidt et all 1 1998( 1 
lUttlev et all (2005J) use a non-linear model to explain the 
lognormal flux distribution and rms-flux relation. All these 
models are phenomenological; based on the PDS alone they 
try to reproduce the observed properties through a mathe- 
matical model, which can pr ovide constraints on phy sical 
models. On the other hand, lArevalo fc Uttlevl (|2006l ) at- 
tempted a more physically-constrained generating process 
to model all the spectral-timing properties simultaneously. 
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The usual course of action is to extract information in 
the frequency domain, such as a PDS, from the time se- 
ries. However, sometimes we need to do the opposite: to 
reconstruct the time series from the PDS. The simulation of 
random time series with arbitrary PDS has a Ions and es- 
tablis hed history (see e.g. iDavis et al.lll98ll : I Liu fe Munsonl 
Il982f ) in the field of digital signal processing. There are also 
papers on simulating time series with spe cific marginal dis- 
tribution, e.g. lognormal II Johnson! 1 1994T ). In astrophysical 
research the reasons and benefits to perform such a recon- 
struction are various. Sometimes it provides a more direct 
tool to .judg e the models or s imulation methods (for an 
example, see iTimmer fe Koniel Il995l ). Also it can be used 
to estimate the e rror bars by Monte Carlo simulation (e.g. 
iDone et al.lll992f ). Since the PDS does not contain the phase 
information, in order to do so one must assign values to 
the phases as the reconstruction based on the PDS alone is 
not unique. One easy way to generate data that reproduce 
a given PDS is to choose the Fourier amplitude accord- 
ing to the PDS a nd assign random phases between [0, 2-k] 
IjDone et al.lll992T) . Based on the theory of linear stochastic 
process and the fact that the PDS its elf follows a chi-square 
distribution. ITimmer fc Koniel (119950 p r opose d a algorithm 
practically identical to lDavies fc Hartd (| 19871 ) — to produce 
the whole variety of possible non-deterministic linear time 
series from the PDS by randomizing both the phases and 
amplitudes. Some authors use the energy- resolved PSDs in- 
stead of the total PSD in the method of ITimmer fc Koniel 
(1995), and shift the phase to yield li ght curves with de - 
sired lag function between them (e.g., IZoghbi et al.ll201(J ). 
Recently, the non-linear behavior of observed light curves 
was studied bv lUttlev et al.l ([2005). They suggested that an 
additional exponential transform needs to be applied to the 
time series created with the method of ITimmer fc Koniel 
l|l995h . 

BHBs are known to exhibit X-ray spectral evolution on 
short time-scales. This evolution is reflected in the presence 
of lags between the light curves at different energy ranges 
and asymmetries of the cross-correlation function between 
them, as well as fast variations of the corresponding hard- 
ness ratio. However, the conventional spectral models ap- 
plied to these systems are designed to fit the energy spec- 
trum averaged over, usually, several thousand seconds. Be- 
cause of limited statistics, it is not possible to follow the 
energy spectrum over the short time-scales corresponding 
to the observed fast variability. 

In this work we propose a new technique that simulates 
a time series starting from the actual intensity measure- 
ments. Specifically, we simulate the light curves in different 
energy bins reproducing all the properties observed in the 
real data: the average energy spectrum, the PDS as a func- 
tion of energy and the the frequency-dependent lag spectra 
between different energy bands. In the simulation, no Pois- 
son noise is of course introduced. With the simulated "clean" 
light curves in different energy bins, we can study the varia- 
tions of the energy spectrum on short time-scales. Our work 
is improved (or different) in three aspects when compared 
with the papers mentioned above. For the first, besides the 
PDS, we make use of other measurements, including energy 
spectra and lag spectra as input to the simulation. Secondly, 
we require that the simulated time series should reproduce 
almost all the timing and spectral properties. At last we 



explore the possible application of the method, including 
filtering Poisson noise and data extrapol atio n. Our work, 
based on those by ITimmer fc Koniel |l995j) and lUttlev et al.l 
(2005 ), can be seen as a c ontinuation of them. In contrast 
to lArevalo fc Uttlevl (|2006T ) ■ our work is model- independent 
and aims at developing an algorithm that recovers the time 
series preserving the information and filtering the noise. 

The article is organized as follows: the preparatory 
works of data analysis and parameter estimation are intro- 
duced in section [2] including the non-linearity study (sec- 
tion [271), energy dependence of PDS (section |2.2[) . lag spec- 
tra (section 12. 3[1 , coherence function (section I2.4|l and dis- 
tribution of the phases (section |2.5[) . The important results 
are presented in section[3] In section [3~T1 the algorithm is de- 
fined by steps. The simulated light curve is compared with 
the observed one (section l3.2|l and the issue of Poisson noise 
subtraction is discussed (section |3.3[) . As one important ap- 
plication, the energy spectra on short time-scales (dynamic 
energy spectra) are derived and the variation of spectral 
shape is investigated in section l3~4l Another possible appli- 
cation, the simulation of data with better time and energy 
resolution than observations can be measured directly from 
the observations, is discussed in section 13.51 The issues as- 
sociated with phase and noise are discussed in section [ 
The conclusions are presented in section [4] 



2 DATA ANALYSIS AND PARAMETER 
ESTIMATION 

The central idea of the algorithm is to synthesize light curves 
that reproduce all observed properties both in the time and 
frequency domain. These are: 

• the average count rate in different energy bands both in- 
tegrated (spectrum) and as a function of time (light curves) 

• the relation between root mean square variability and 
count rate in the light curves at different energies (rms-flux 
relation) 

• the shape and normalization of the PDS as a function 
of energy 

• the phase/time-lag spectrum as a function of energy 

Once these properties are extracted from the data, syn- 
thetic data curve can be constructed to reproduce the same 
results. 

A preliminary step is to obtain the above quantities 
with good accuracy for a black-hole binary. We used a single 
observation of Cygnus X-l from the Proportional Counter 
Array (PCA) on board the Rossi X-ray Timing Explorer 
(RXTE) (ObsID 10238-01-05-000). Cyg X-l is the first- 
discovered black hole binary and has been studied for several 
decades. Its brightness and persistence make it a perfect tar- 
get for X-ray timing research. The observation was carried 
out in March, 1996, when Cyg X-l was in its hard spectral 
state , the most common for the source (e.g. IWilms et al.l 
2006). The data configuration used here is the generic PCA 
binned mode B_16ms_64M_0_249, which provides high res- 
olution in both energy (64 channels over the full 2-60 keV 
PCA band) and time (16 msec). This is the main reason 
why we selected this particular observation, which also has 
the advantage of having been made with all five propor- 
tional counter units of the PCA, increasing the source count 
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rate. In order to obtain sufficient statistics for the analysis, 
we rebinned the data into eight energy bins between 0.14- 
25 keV. In this way in each energy bin the mean count rate 
is above ~ 1000 counts s _1 . Notice that the contribution 
of background photons to each of these eight bins is minor 
compared to the source counts. 

For the analysis, we used custom-made software written 
in the IDL environment. For each of the eight energy bins, we 
extracted a Fourier Spectrum from data stretches of length 
128 s, up to a Nyquist frequency of 32 Hz. These Fourier 
spectra were averaged for constru cting the PDS (normalized 
to the squared fractional rms, see lBelloni fc Hasingerlll990l ) 
and the phase-lag spectra. The Poiss on contribution wa s 
subtracted by using RXTE recipes (see lZhang et alj|l9950 . 



2.1 Rms-Flux relation 



lUttlev et al.l (J2005J) showed that the rms-flux relation, the 
non-linear behavior and the lognormal flux distribution ob- 
served in the hard state of BHB represent three different 
aspects of the same underlying process. This non-linearity 
can be reproduced as an exponential of a linear light curve. 
As the plan is to simulate the source light curve in differ- 
ent energy bands, we must first check that this non-linearity 
holds also for separate energy band. We then produced the 
rms-flux relation and the flux (count rate) distribution for 
each of our eight energy bins (see Fig. [l] and Fig. [2j . 

The rms in Fig. [l] was measured by integrating the PDS 
of the eight bins over the 1-32 Hz frequency interval for 1 s 
segments. Its relation with the count rate, also in 1-s bins, 
is consistent with linearity for all bins. This linear relation 
holds also when the length of the light curve segments and 
the frequency range for the integration are changed. At the 
same time, the flux distribution (with a time bin of 0.25 s 
in Fig. [2]) fits a lognormal model. Further subdividing the 
energy range into narrower bins we did not find significant 
deviations. 

As shown by lUttlev et al.l (J2005J), for typical observed 
light curves with the PDS dominated by broad components 
and a fractional rms of 20-40 per cent, the distorting effect 
of the exponential transformation on the shape of the PDS 
is relatively small. A quantitative analysis (not presented 
here) suggests that for Lorentzian-shaped PDS components 
with fractional rms smaller than 50 per cent, the distortion 
is not serious if the quality factor \j Q < 2. We tested the 
distortion on the PDS caused by exponential transformation 
of data in the time domain: we calculated the PDS from the 
logarithm of the real data and compared it with the orig- 
inal one (Fig. [3]). The PDS shapes are almost unchanged 
between the raw data and the logarithmically transformed 
data. Therefore in our simulation below, it is justified to use 
the observed PDS as the PDS of the input linear light curve 
without the need to correct the distortion effect of the ex- 
ponential transformation. However, it is still important to 
apply the appropriate correction to the normalization of the 
input PDS in order to return the desired variance in the 
output light curve, because the exponential transformation 
will cause the increase of the light curve variance. Also the 

The quality factor is the ratio between the Lorentzian centroid 
frequency and the full width at half maximum (FWHM). 




Frequency (Hz) 



Figure 3. The PDS of the raw data (solid line) and the loga- 
rithmically transformed data (dot). The latter is renormalizcd to 
be directly compared with the former. The Poisson noise is not 
subtracted from the PDS. 



mean count rate of light curve would change after the expo- 
nential transformation, and a correction factor needs to be 
multiplied to the non-linear light curve. 

2.2 Energy dependence of the PDS 

We extracted an average PDS from each of the eight energy 
bins, covering the frequency range 0.008-32 Hz (Fig. Q. 
No narrow QPOs are seen. A simple model consisting of 
two broad Lorentzians was used for the fit. The goodness 
of the fit is reasonably good, with all reduced x 2 smaller 
than 2.2 (obtaining a formal reduced % 2 of the order of 
unity is difficult for these high-signal PDS). Adopting a 
more complex model (e.g. iBelloni et al.l Il997l ) the good- 
ness of fit would be improved but the fit parameters would 
be poorly constrained. There are three free parameters for 
each Lorentzian: the normalization (the square of the inte- 
grated fractional rms), the centroid frequency and FWHM. 
The evolution of the PDS shape with energy can be well 
described by the energy dependence of the best-fitting pa- 
rameters, which is shown in Fig. [5] Concerning the second 
Lorentzian, for the last three energy bins the best-fitting 
centroid frequency decreases to zero, the lower bound of this 
free parameter. For those three bins, uncertainties were not 
plotted. 



2.3 Energy dependence of the time lag spectrum 

The time lag of the light curve in each energy bin relative to 
that of the lowest energy bin (0.14-3.4 keV) was calculated 
from the cross-spectra between 0.06 and 30 Hz. Positive lags 
here correspond to the hard time series lagging the soft. The 
lags were logarithmically rebinned in frequency in order to 
reduce noise. Since their calculations involve the splitting 
of the data into two energy bands, compared with the PDS 
the meas urement of time lag s is more sensitive to count- 
ing noise. iNowak et al.l (1 19991 ) estimated the expected noise 
level for the time lag measurements and concluded that for 
frequencies below ~ 0.1 Hz and above ~ 30 Hz lags cannot 
be measured because of noise limitations. As the frequency 
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Figure 1. The rms-flux relations for 8 energy bins. They are produced by binning the 1-32 Hz rms measured in 1 s segments into flux 
bins. The dashed line is the best-fitting linear model for each plot. 
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approaches ~ 0.1 Hz or ~ 30 Hz, the lags tend to zero due 
to the effect of noise. When sampling fluctuations become 
comparable to the intrinsic lags, they scatter around zero 
and exhibit negative values. 

We adopted the same strategy as for the PDS to quanti- 
tatively desc ribe the energy depe ndence of lag spectrum in a 
uniform way. iNowak et al.l (|1999T ) showed that the time lags 
approximately show a power law dependence upon frequency 
(oc /~ 0,7 ). We found significant deviations from a simple 
power law model. The time lag spectra show a two-humped 
shape similar to that published in p r evious studies (e.g ., 
iMivamoto et al.lll992l ; ICui et al.lll997l ; INowak et al"1ll999h . 
We used a two-Lorentzian model to fit the time lag spectra. 
The time lag spectra with the best-fitting two-Lorentzian 
models are shown in Fig. [S] Because the negative lags have 
small values and appear in the frequency range where the 
noise level dominates, they are not expected to be intrin- 
sic. The negative lags were therefore excluded from the fit. 
The evolutions of the best-fitting parameters with energy 
are shown in Fig. [7] 



2.4 Coherence function 

The coherence function is a Fourier frequency- 
dependent measure of the linear correlation between 
time series measured simult aneously in two energy 
bands (jVaughan fc Nowaklll997r ). Our simulation does not 
include any incoherent variability, i.e. variations in one 
energy band that are not correlated with variations in other 
bands. In other words the algorithm contains an underlying 
assumption of a single emission component in different 
energy bands with a single delay at a given frequency and 
unity coherence. We calculated the coherence function of 
Cyg X-l data with correc tion for counting no i se, fol lowing 
the recipe presented in IVaughan fc Nowakl (| 19971 ^) . The 
results are shown in Fig. which demonstrate a remarkably 
high coherence (close to unity) over a wide frequency range 
and consistent with previous coherence study of Cyg X-l 
(e.g.lVaughan fc Nowaklll997l ; ICui et al.|[l997l ; INowak et all 
1999). Therefore, we can say that below ~10 Hz, the flux 
in each energy band can be regarded as originating from 
one single coherent component, whose intrinsic phase delay 
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Figure 6. Time lag spectra for various energy bins versus the lowest energy bin (0.14-3.4 keV). Dots represent the positive lags (hard 
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model for the PDS at different energies. The left and right 
columns correspond to the two Lorentzian components. The pa- 
rameters are (from top to bottom) normalization, centroid fre- 
quency and FWHM. The circles are the interpolation values be- 
tween energy bins (see section [3.5|l . 



is indicated by the lag spectrum. The coherence becomes 
slightly lower at higher frequencies, which may indicate 
the presence of incoherent components. We do not attempt 
to add them into the simulation because it is a laborious 



and model-dependent process and beyond the scope of this 
work. 
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Figure 8. The coherence function for various energy bins versus 
the lowest energy bin (0.14-3.4 keV). 



2.5 Phase 

In order to reconstruct the time series from the PDS and 
time lag spectra, a prior phase distribution has to be as- 
sumed, since the PDS does not contain phase information. 
We first analyzed the real data in order to derive a reason- 
able phase distribution. We split the 0.015625-s binned light 
curve into 694 segments, each with a length of 1024 points 
(16 seconds). For each segment, we produced a Fourier trans- 
form, leading to 694 values of the phase angle for each fre- 
quency between 0.0625 Hz and 32 Hz. Obviously, the phases 



between separate segments are comparable only after con- 
sidering the additional phase shift caused by the time-delay 
between their start time. If ifij.i is the phase angle at fre- 
quency fj of the «th segment, to,i and £n,i are the start times 
of the ith segment and the 1st segment respectively, the "ab- 
solute" phase at this frequency for the ith segment can be 
calculated as 

fiti = fit + 27r/i(to,i — *o,i)- 

In Fig. [9] we plot the phase at different frequencies for 
the first segment (panel a) , the phase at a certain frequency 
for different segments (panel 6) and its histogram of occur- 
rence (panel d). Moreover, we studied the auto correlation 
function of the phase at different frequencies (shown in panel 
c). All these results clearly show that the phase follows a uni- 
form distribution between — n and 7rand the phases at differ- 
ent frequencies are random and independent. Therefore, in 
our synthesis algorithm we generate uniformly distributed 
random numbers in the interval (— 7r,7r] as phase angles for 
the Fourier transform. 



3 RESULTS 

3.1 The algorithm 

Having obtained the energy-resolved PDS (with Poisson 
noise subtracted) and the time lag spectra for our eight en- 
ergy bins, and the average energy spectrum, we followed the 
following procedure to generate a synthetic light curve: 

• Step 1: for the lowest energy bin (1-bin), we generated 
uniformly-distributed random numbers between (—it, it] to 
be used as phase angle <p(fi) at Fourier frequency fi(fi ^ 0). 
In order to obtain real values for the time series, we chose the 
phase for the negative frequencies as <p(—fi) — — ¥>(/»)■ Fb r 
the energy bins 2 through 8, the phases were reconstructed 
from the 1-bin values according to the measured phase lag 
spectra which provide the phase shift relative to the lowest 
energy bin at each frequency. 

• Step 2: the amplitude of the Fourier transform at each 
frequency was obtained from the PDS. In order to account 
for the effects of the exponential transformation to the vari- 
ance of light curve, the PDS should first be renormalized 
in order to obtain the desired variance. For a PDS P(f) 
in units of (rms/mean) 2 Hz -1 and with frequency bin size 
A/, the desired fractional rms is R 2 = EP(/)A/. The PDS 
mu st be multiplied by a factor of log(R 2 + 1)/R 2 (for details 
see lUttlev et al.ll2005l ). The square root of the renormalized 
PDS is the amplitude of the Fourier transform A(fi). The 
series need also be expanded to negative frequencies with 
A(-/i)=^(/i). 

• Step 3: for each energy band, we calculated the in- 
verse Fourier transform of A(fi) exp(j(p(fi)) (where j is 
the imaginary unit) to obtain the linear time series l(t), 
and then calculated its exponential. In order to ensure that 
the simulated light curve has the desired mean count rate 
(7(2?) measured in the average energy spectra, a factor of 
(7(23) log(i? 2 + l)/R 2 needs to be multiplied to exp[l(t)]. 

• Step 4'- the time series obtained with the previous steps 
was stored as one light-curve segment. Steps of 1-3 were 
then repeated to produce multiple segments. 
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Figure 9. (a) The phase at different frequencies for the 1st segment; (b) the phase at 0.0625 Hz for 694 segments; (c) the auto correlation 
function of phase at different frequencies for the 1st segment; (d) the histogram of phase at 0.0625 Hz. 



3.2 Test of the Simulation 

To check whether the simulated light curve replicates all the 
observed properties of original real data, we compared their 
PDS, time lag spectra, rms-flux relation and lognormal flux 
distribution. The comparison relative to the energy bin 5.5- 
6.8 keV is shown in Fig. \W\ as an example. The simulated 
light curve of course cannot have exactly the same evolution 
as the real one, as a random-number input is involved, but 
they appear to be similar in the amplitude and time-scale 
of variance. The PDS, the time lag spectrum, the rms-flux 
relation and the flux distribution are consistent with those 
from the real data, showing that our simulation reproduces 
accurately the intrinsic properties of the real data. In other 
words, our algorithm can synthesize data whose statistical 
properties are indistinguishable from those observed from 
Cyg X-l. 



3.3 Poisson Noise Subtraction 

The expected influence of Poisson fluctuations in the time 
series is represented as a white noise component in the PDS. 
Since it is independent of the source signal (apart from 
dead-time effects), the Poisson noise can be considered as 
a "background" component in the PDS, against which we 
try to observe other features caused by the intrinsic vari- 
ability of X-ray source. If the light curve is a series of con- 
tiguous time bins, the expected P oisson noise level i s simply 
2 for the "Leahy" normalization (jLeahv et al.lll985T) . In this 
work the PDS is normaliz ed in units of (rms/mean) 2 Hz -1 
(jBelloni fc Hasingerlll990t ). and the expected Poisson noise 
level in the PDS is given by 

_ 2(C + B) 

1 noise — -~. 

C- 1 



where C and B are the mean sourc e count rate and back - 
ground count rate, respectively. See IVaughan et alj (|2OO30 
for more details about the different normalizations of PDS 
and the corresponding Poisson noise levels. Notice that the 
shape and level of the Poisson noise contribution to the PDS 
are modified by d ead-time effects (for the RXTE/PCA, see 
IZhanget ai]|l995l ). 

Therefore, we can easily subtract the Poisson noise level 
from the PDS, and obtain the "clean" light curve without 
Poisson noise with our algorithm. In other words, our syn- 
thetic algorithm can be used as a filter of Poisson noise. We 
can check this by adding Poisson fluctuations to an initial 
simulated "clean" light curve, then filter the "dirty" light 
curve with our algorithm to see whether the filtered light 
curve resembles the initial one or not. Because the algorithm 
can not repeat the exact shape of the light curve due to the 
phase randomization, in order to make a direct comparison 
between the clean light curve and filtered light curve, we 
have to record the phase information of the initial data as 
the input of the algorithm instead of using random phases. 
We do this here because our aim is simply to check the effect 
of Poisson noise filtering for the synthetic algorithm. 

The results are shown in Fig. 1111 We can see how the 
Poisson noise is removed in the filtered light curve. A quan- 
titive measurement is the standard deviation, which is 320, 
420 and 340 for the clean, dirty and filtered light curves re- 
spectively. The excess variation of the filtered light curve 
at high frequencies is very probably caused by the distor- 
tion introduced by the exponential transform, which tends 
to exaggerate the positive variation, e.g. the amplitudes of 
the flares. The conclusion is that the synthetic light curve 
created by our algorithm can be considered essentially free 
of Poisson noise. 
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Figure 10. Comparison of the simulation with the algorithm described in section [3.11 (with additional Poisson noise) and the real data 
(ObsID 10238-01-05-000) for the energy bin 5.5-6.8 keV. Top: one segment (1024 points) of light curve of the real data (left) and the 
simulation (right). Middle: the PDS (left) and the time lag spectra (right) of the simulation (dot for positive values and asterisk for 
negative values) and the real data (solid line). Bottom: the rms-flux relation (left) and flux distribution (right) of the simulation (dot) 
and the real data (solid line). The error bars of the simulation data are plotted. 




Figure 11. Top: the initial clean light curve and its PDS. Middle: the light curve with additional Poisson noise and its PDS. Bottom: 
the light curve filtered by our algorithm and its PDS. The dashed line in the right panels is the PDS of the initial clean light curve. 
The PDS are calculated from 50 segments, each with a length of 1024 points. Each of the left panels shows only one segment of the 
corresponding light curve. 
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It is worthy to point out that if the phases were known, 
we could reconstruct the time series strictly by inverse 
Fourier transform, because this time-to-frequency transition 
is completely reversible. The exponential transform would 
be not be necessary. We therefore face an interesting prob- 
lem: if the phase is known, the exponential is redundant for 
reconstructing the initial time series, which is completely de- 
fined by the fourier spectrum; however if we know nothing 
about the phases and assume them to be random, in order 
to reproduce a time series satisfactorily we have to apply the 
exponential transformation. We will discuss this problem in 
section 13.61 

The above process of subtracting the Poisson noise is 
similar to Wiener filtering. The Wiener filter is the optimal 
filter in the least-square sense for the removal of noise from 
a time domain signal. The Wiener filter is desi gned in th e 
Fourier domain and can be expressed as (jPress et al.l ll992): 



*(/) = 



|5(/)| 2 



\S(f)\ 2 + \N(fW 
in which S and TV are the Fourier transforms of the in- 
trinsic signal and the noise, respectively. The denominator 
\S(f)\ 2 + \ N (f)\ 2 is proportional to the PDS of the measured 
light curve (under the assumption that signal and noise are 
statistically independent). The filter can be constructed if 
the true form of the intrinsic power |.S(/)| 2 is known or can 
be estimated well. Our algorithm and the Wiener filter share 
some common ideas — to separate noise and signal in the 
frequency domain, which can not be done in the time do- 
main. We fixed the noise power to 2 and applied a Wiener 
filter to the same data in Fig. 1111 The difference between 
the Wiener filtering solution and ours is shown in Fig. 1121 
The filtering effect are generally similar for the two methods. 
If the PDS used in the Wiener filtering and our algorithm 
are those averaged over many segments (panel b and d in 
Fig. 1121) . our algorithm appears to preserve more short time- 
scale fluctuations, which makes it more similar to the origi- 
nal one. If the PDS of the single segment shown in the figure 
is used in designing the Wiener filter (panel c in Fig. I12[) , its 
solution would be much more noisy than the original light 
curve. From this perspective, our algorithm seems to pro- 
vide a more stable filter as it is able to subtract the noise 
and at the same time to avoid wiping off too much rapid 
variability. This is probably due to the exponential trans- 
formation that can restore the fluctuation amplitude to a 
certain extent. The price of the exponential transformation 
is that the exact shape of light-curve is slightly distorted. 



3.4 Dynamic Energy Spectra 

If the photon count in one bin of the energy spectrum is TV, 
the relative standard deviation expected from the Poisson 
distribution is 1/vTV. When we want to obtain the aver- 
age energy spectra with high statistical significance, we need 
a sufficiently long exposure (hundreds to thousands of sec- 
onds) to accumulate enough photons. It is important to see 
how the energy spectra evolve on short time-scales which, 
however, in this way is not possible. The energy spectra on 
short time-scales, which we name dynamic energy spectra, 
can be obtained by aligning the light curves of different en- 
ergy bins in time and obtaining the counts in every time bin 
as a function of energy. The problem of this analysis from 




Figure 12. (a): the initial clean light curve (one segment, the 
same as in the left top panel of Fig. lilt . This is used as the 
input for both Wiener filtering and our algorithm, after adding 
Poissonian noise (see text), (b): the Wiener filtering solution with 
the averaged PDS over many segments, (c): the Wiener filtering 
solution with the PDS of the single segment, (d): the solution of 
our algorithm (the same as in the left bottom panel of Fig, lilt . 



the real data is that the Poisson fluctuation is severe in this 
case due to the small number of counts. The simulated light 
curve obtained with our synthetic algorithm, as presented 
in the last section, does not include Poisson noise. The dy- 
namic energy spectrum produced from the simulated light 
curves is therefore "cleaner" and can reveal the underlying 
properties otherwise hidden by noise. 

The dynamic energy spectra with 8 energy bins were 
calculated for both real data and simulation at three time- 
scales (or time bin sizes), 0.0625 s, 0.25 s and 1 s, and 
then fitted with XSPEC using the PCA detector response 
matrix. Notice that the synthetic data, having reproduced 
the background-subtracted energy spectrum, are also back- 
ground free. The lowest energy bin (0.14-3.4 keV) is excluded 
due to the uncertainty in the PCA calibration below 3 keV 
and a simple power law is fitted to each of the dynamic 
energy spectra. There are in total 25600, 6400 and 1600 
dynamic energy spectra that were fitted for the three time- 
scales, respectively. The power law photon index F is the 
parameter that we studied to reflect the basic shape of dy- 
namic energy spectra. For a comparison of real data and 
simulation, we plot the time evolution of Y as well as the flux 
covering the whole energy band (0.14-25 keV) (Fig. ll3p . the 
correlation between Y and flux (Fig. I14[) and the histogram 
of r (Fig. I15[) . The noise-free reconstructed data providesa 
"cleaner" view of the F distribution and the F-flux corre- 
lation. One possible reason for the improved correlation is 
that the simulation does not include any incoherent vari- 
ations that may weaken the correlation for the real data. 
However, we have shown that the coherence in the real data 
is very close to unity for most of the frequencies considered 
here and therefore this possibility can be excluded. 

Combining the above results we can conclude that: 
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Figure 13. The 0.14-25 keV light curve (dashed line) and the corresponding power law photon index T evolution (solid line) for (a) real 
data and (b) simulation (without Poisson noise), with time bin sizes of 0.0625 s, 0.25 s and 1 s from top to bottom. 
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Figure 14. The correlation between power law photon index T 

and flux at three time-scales (or time bin sizes) of 0.0625 s, 0.25 s 
and 1 s for real data (left column) and simulation (right column) . 



Figure 15. The histogram of power law photon index F at three 
time-scales (or time bin sizes) of 0.0625 s, 0.25 s and 1 s for real 
data (left column) and simulation (right column). 



(ii) The distribution of F is narrower for the simulation 
(i) The correlation between F and flux is somewhat higher than for the real data, 

for the simulation than for the real data. (hi) The above differences between simulation and real 
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data tend to increase at shorter time-scales, i.e. for lower 
photon count numbers. 

The correlation between V and flux is consistent with 
the previous results that t he hardness rat i o anti-correlates 
with the X-ray fl ux (e.g. ICui et al.l |2002| ; iLiu fc Lil |2004 
IWilms et al.ll2006l ) or that the photon index correlates simi- 
larly with the flux on time-scale of days (e.g. Zdziarsk i et al.l 
|2002| ; IPottschmidt et alJlJOO^ ; iGierlifiski fc Zdziarskill2003h . 



The significantly better correlation for simulated data 
(Fig. ll4[) shows that our algorithm enables us to significantly 
reduce the effects of Poisson noise and study the intrinsic 
spectral evolution at short time-scales. The Poisson noise 
also broadens significantly the distribution of T (Fig. I15p . 
However, we need to be cautious to claim that the broaden- 
ing of the r distribution from the simulation (as shown in 
the right column in the Fig. I15J1 is completely caused by the 
intrinsic short-time-scale spectral fluctuation. It might also 
be introduced by our algorithm, which we investigate next. 

In order to test the effects of the algorithm on the re- 
covery of r values, we produced a new synthetic dataset by 
using all information described above with the exception of 
the F values, which were fixed at the single value of 1.69. 
In other words, we produced a set of light curves with null 
time/phase lags, which were therefore identical except for 
their normalizations. We added Poisson noise and studied 
the output P distribution after applying our algorithm. The 
histograms of T for the zero-lag synthetic data, the values 
obtained through direct spectral fitting and those from our 
reconstruction are plotted in Fig. 1161 From our test, we can 
derive that the systematic broadening of the V distribution 
introduced by our procedure is very limited (less than 0.004 
or 0.24 per cent) (notice the horizon scale of the right panel 
in Fig. [To]) . 

The observed P distribution shown in the left column 
of Fig. [TS] is the result of both the intrinsic distribution in 
F and the broadening due to noise. Thus, the standard de- 
viation or of the intrinsic spread in P can be quantitatively 
estimated as: 



or 



= Vo 



(e 2 ) 



where o"J ata is the variance of the T distribution observed 
from real data (left column of Fig. ll5|) . and (e 2 ) is the mean 
square error on P obtained from the spectral fitting analysis. 
For the shortest time-scale 0.0625 s, we have obtained 25600 
values of P from fitting the dynamic energy spectra, obtain- 
ing "data = 0.0141, (e 2 ) = 0.0113. CTdata is approximately 
equal to (e 2 ), which proves that the spread in P at short 
time-scale from the real data is mostly due to the Poisson 
noise. With the above equation we obtain or = 0.0529, close 
to the standard deviation of 0.0517 calculated from the sim- 
ulation data (the top right panel in Fig. I15p . For the longer 
time-scales, the two values are also found to be comparable. 
Again this fact supports that our algorithm is capable of 
filtering the Poisson noise and reveal t he intrinsic di s tribu - 
tion of r. It is interesting to notice that lKotov et al.l (J2001J) 
assumed the small variations of the power law index in their 
time-lag model. Our study on the intrinsic T distribution 
thereby gives an evidence for their assumption. 
We can therefore draw three conclusions: 

(i) the broadening of T distribution introduced by our 
algorithm can be neglected and the histograms on the right 



panel of Fig. [15] reflect the intrinsic spectral variation of the 
source on these time-scales; 

(ii) our synthetic method is indeed powerful in filtering 
Poisson noise and recovering underlying statistical proper- 
ties of the source data which are masked by Poisson noise; 

(iii) the T distribution obtained through short-time spec- 
tral fitting of the data is completely dominated by the effects 
of Poisson noise and cannot be used to ascertain the real dis- 
tribution. 

A different appr oach was followed studied by 
iRevnivtsev et al.l (| 19991 ') through Fourier-resolved spectra, 
which give the energy-dependent variability amplitude in a 
certain frequency range. The frequency dependent spectral 
variability revealed by the Fourier-resolved spectra, how- 
ever, cannot be immediately linked with the variation of 
spectral indices studied here. First, the time-scale in our 
work refers to the time bin size other than the reciprocal 
of frequency. The variations sampled on short time bins 
come from both low-frequency and high-frequency variabil- 
ities presented in PDS. However, the power density or vari- 
ability calculated by binning the time serie s is comparable 
to that on the corresponding frequency fsee lWu et al.ll2009l . 
and refernces therein), and the time-scale can be taken in 
practice as the time bin size. What is more important is that 
although Fourier-resolved spectra have a form similar to en- 
ergy spectra, they have a totally different physical interpre- 
tation. Therefore, the comparison of their spectral indices 
is not useful. For example, the Fourier-resolved spectrum 
was found to be harder at higher frequencies, which only 
suggests that the hard X-ray radiation component exhibits 
larger variation amplitude compared with the soft radiation 
as frequency increases. The only possible connection we can 
seek between this phenomenon and our study on the energy 
spectral indices is that the variations of V on short time-scale 
are probably mainly due to the hard spectral component. 



3.5 Extrapolation and Interpolation 

The synthetic algorithm can be used to produce the data 
with better time and energy resolution than the real data, af- 
ter some additional hypotheses are made. A two-Lorentzian 
model has been used to describe the PDS and lag spectra 
(see section [2.21 and | 2.3[) . If we extrapolate the model fre- 
quency beyond the Nyquist frequency of the original data, 
we are able to derive a synthetic light curve with higher 
time resolution. Moreover, we have derived the energy de- 
pendence of the best-fitting parameters for the PDS and lag 
spectra. By interpolating these functions, we can obtain the 
PDS and lag spectra (and therefore produce synthetic data) 
on an energy grid finer than the initial energy resolution. 
The hypothesis on which the extrapolation and interpola- 
tion are based is of course that PDS and time lag evolve 
smoothly in frequency and energy and that they can be ex- 
trapolated from the observed values. 

The results of the extrapolation to higher frequency and 
of the interpolation between energies are shown in Figs. \TT\ 
and [18] respectively. The time resolutions in the two panels 
of Fig. [17] are 0.004 and 0.001 s, smaller than the time res- 
olution of 0.015625 s for the real data. The dynamic energy 
spectra were also studied on these time-scales and the re- 
sulting power law photon indices are plotted in the figure. 
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Figure 16. The histograms of power low photon index F at 0.0625 s for data with T distributed as a S function at 1.69 (left), the same 
data with added Poisson noise (middle) and the reconstructed data with our synthetic algorithm (right). 
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Figure 18. The real and simulated light curves in two narrow en- 
ergy bins of 4.34-4.64 keV (upper panel) and 11—11.6 keV (lower 
panel). The simulation is based on the interpolation of the best- 
fitting parameters for PDS and time lag spectra between energy 
bins. 



Poisson noise is not added to the light curve and an explicit 
positive correlation between photon index and flux can be 
observed. The best-fitting parameters of PDS and lag spec- 
tra were interpolated to two additional narrow energy bins, 
4.34-4.64 keV and 11-11.6 keV (see open circles in Fig. [5] 
and Fig. [7J. As mentioned, the original data have 64 energy 
bins, which we rebinned into 8 coarse energy bins as the 
input of our algorithm. This allowed us to compare the sim- 
ulation and the real data in these two narrow energy bins, 
as shown in Fig. 1181 



3.6 Phase and Noise 



In section 13.31 we wondered whether an exponentiation is 
necessary after the inverse Fourier transform to produce a 
light curve similar to the real one when random phases are 
assumed. This is peculiar since we can strictly recover the 
time series with the inverse Fourier transform alone, if we 
know the phases. It is therefore logical to deduce that the 
phases of the real data cannot be random. The exponential 
transformation is simply a compensation for the incorrect 
assumption of random phases. The effect of the exponential 
transformation in the time domain should be represented in 



the Fourier domain as a modification of phase, since there 
is practically no effect on the PDS shape. The fact that the 
phase cannot be totally random and independent can also 
be revealed by higher order variability properties, e.g. the 
bicoherence, a higher order statistics measuring the degree 
of coupling between variations on different time-scales. A 
non-zero bicoherence indicates that there exist correlations 
between the phases at different frequencies within a singl e 
energy band (JMaccarone fc CoppilExii : luttlev et al.ll2005h . 
Hence the phases cannot be independent, nor can they be 
strictly uncorrelated. From Fig. [9] the phases appear to be 
uniformly distributed over the range (— 7r,7r] and a correla- 
tion between them is not apparent. A higher order statistical 
test such as bicoherence would probably show these effects. 
The assumption about the phase distribution presented in 
section 13.31 can be considered as a lower-order approxima- 
tion, and is appropriate for the practical purpose of simulat- 
ing the linear time series which will later be exponentially 
transformed. 



Moreover, it is possible that the procedure of segment- 
ing the data conceals the underlying phase. In order to in- 
vestigate this possibility, we performed this experiment (see 
Fig. I19[) . With an arbitrary PDS and zero lag between en- 
ergies, we synthesized an initial light curve. Its time lag is 
of course zero at all frequencies. If we add Poisson noise to 
the light curve, the phases (calculated from the FFT) would 
be scattered around zero. In practice the start time of the 
observation is arbitrary and the long light curve is split into 
short segments to calculate the phase. We chose an arbi- 
trary starting point and calculated the FFT in 1024s-long 
segments. The phases at different frequencies for a single 
segment, and the phases at different segments for a sin- 
gle frequency apparently deviate from zero. If we again add 
Poisson noise, the phase becomes totally random. Therefore, 
even if the intrinsic phase is not random or uncorrelated, 
the arbitrary selection of a starting point and the presence 
of additional Poisson noise would make the detected phase 
appear random. The underlying phase is likely not random, 
although this cannot be inferred directly by the data. Up 
to now we still know little about the intrinsic distribution 
of the phases and cannot propose a hypothesis more rea- 
sonable than the random distribution. Therefore, we stick 
to the assumption of random phases uniformly distributed 
between (— n, it] throughout this work. 
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Figure 17. The time evolution of power law photon index T (solid line) and 0.14-25 keV net flux (dashed line) derived from the 
simulation based on the extrapolation of PDS and time lag spectra to higher frequency. The time-scales of 0.004 s (upper) and 0.001 s 
(lower) are shorter than the time resolution of the original data. 
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Figure 19. Top: the phases of the initial light curve with (left) and without (right) additional Poisson noise. Middle: the phases at 
different frequencies for a single segment after arbitrarily choosing the starting point and segment (left) and the case with additional 
Poisson noise (right). Bottom: the phase for different segments for a fixed frequency after arbitrarily choosing the starting point and 
segment (left) and the case with additional Poisson noise (right). 



4 CONCLUSIONS 

We have developed an algorithm to produce synthetic light 
curves from the observed properties of Cyg X-l. The algo- 
rithm is based on the previously established time series sim- 
ulatio n method (e.g. iTimmer fc Konia Il995l ; lUttlev et al.l 
I2005T ) and improved in three aspects: a) besides the PDS, 
the information from additional measurements such as en- 
ergy spectra and lag spectra is used as input; b) the syn- 
thetic time series is required to reproduce almost all the 
timing and spectral properties; c) the possible applications 
of the method are explored, including filtering Poisson noise 
and data extrapolation. It is model independent, unlike 



the attempts to restore the timing and spectral properties 
through physically inte resting model and parameters (e.g. 
lArevalo fc Uttlevll2006r ). The simulation do not provide in- 
formation that are not already contained in the original 
PDS, lag spectra and etc. What we do is to allow a different 
view of the same data. 

By combining all known information about the observed 
variability, a reasonable assumption on the distribution of 
phases, and prior knowledges about the Poisson noise power, 
we can obtain synthetic data which are not affected by Pois- 
son noise. From these synthetic data, we can explore the 
spectral variability of the source on short time-scales, where 



14 Y. X. Wu, T. M. Belloni and L. Stella 



the real data are noise-dominated. We showed that the ob- 
served distribution of spectral indices of Cyg X-l on short 
time-scales is completely dominated by Poisson effects, as 
even simulated data with a 8 distribution in spectral indices 
yields the same output distribution. From the output of our 
algorithm, we have recovered the real underlying distribu- 
tion, under a relatively small number of assumptions. Our 
method shows that our current data are sufficient to repro- 
duce the observed properties with good accuracy. Future 
missions will yield much higher statistics and will allow to 
explore spectral variability at higher frequencies and with 
better spectral resolution. 
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